ABSTRACT



A review of discrete pursuer-evader games and known
solutions is presented. A method is given for obtaining a
finite-memory, near-optimal evader strategy for the three-step
game, which greatly reduces data storage requirements
compared with previous near-optimal strategies. Additionally,
near-optimal evader strategies for the four-step game are
discussed.
I. INTRODUCTION



The discrete-time-step pursuer-evader game was first
described by Rufus Isaacs of the Rand Corporation in the
early 1950s, in an attempt to study the problem of
attacking a moving target who is maneuvering so as to
confound the prediction of his future position. The general
problem, as described by Isaacs, is as follows:

A battleship in midocean is aware of an enemy bomber's
presence, but the plane is too high for precise
detection. The ship is interested only in not being
hit; it has no offensive means. The plane has one bomb
and we suppose--to avoid extraneous factors--that the
bomber's aim is excellent. The battleship knows this,
but knows nothing about when or where the bomb will be
dropped until after detonation. It is to maneuver so
as to minimize the hit probability. . . There is a time
lag T between the bomber's last sighting of the ship and
detonation. Thus the bomber must aim at an anticipated
position of the ship . . . As simple as this problem
sounds circumstantially, it is difficult technically.

To gain a foothold, we simplified it further. We made
the ocean one-dimensional and discrete. That is, we
supposed the battleship to be located on one of a long
row of points and at each unit of time he hops to one
adjoining one, enjoying the sole choice of a right or
left jump. The time lag was to be an integral number n
of time units, or--the same thing--of jumps. This is
tantamount to saying that the bomber knows all positions
of the battleship which precede his present one by n
jumps or more. Ref. [1]

The solution to the single-time-step game (i.e., n = 1) is
trivial, but the complexity increases greatly as the time lag,
or number of time steps, increases. Isaacs, upon formulating
the game, proposed pursuer and evader strategies for the
two-step game; however, the proof of the optimality of these
strategies is highly complex. The complexity of the multiple-step
games arises from the fact that the evader does not know
when the pursuer will attack; if he did, it would be an easy
matter for the evader to distribute himself uniformly over
the n+1 possible positions at the time of detonation and
limit the pursuer to a kill probability of 1/(n+1).

Without knowing the time of attack, the evader must attempt
to make his position uniform at every time step, and this is
not possible.

The three-step pursuer-evader game is as yet unsolved;
however, near-optimal strategies for both the pursuer and
evader have been described. The best existing evader
strategy, developed by Joseph Bram Ref. [2], requires the
evader to maintain an infinite memory of probabilities, the
k-th being the probability of turning given that the evader
has not turned for the last k moves. This thesis
investigates alternative finite-memory evader strategies in an
attempt to lower the existing upper bound on the three-step game
value while drastically reducing memory requirements, and
additionally looks briefly at possible evader strategies in
the four-step game.
II. KNOWN SOLUTIONS AND STRATEGIES FOR PURSUER-EVADER GAMES



A. STRUCTURE 

For uniformity, the convention and structure described
below will be used hereafter in the description of all
discrete n-step pursuer-evader games. The pursuer is the
maximizing player who, by selection of time of fire and aim
point, tries to maximize the probability of killing the
evader (a kill is achieved when the pursuer fires at the
position the evader subsequently occupies n time steps
later). The evader is the minimizing player who, by selection
of maneuvers along the discrete linear state space,
attempts to minimize the probability of being killed. The
evader's maneuvers can be described as a sequence of lefts
and rights (L and R), with each n-bit sequence of L's and
R's corresponding to one of the n+1 final positions
achievable in n steps from an arbitrary starting position, as
shown in Figure 2.1. This mapping of n-bit
left-right sequences to final positions is symmetric under
interchange of L's and R's (e.g., LLR corresponds to the
position symmetric to that of RRL in the three-step case). Due to this
symmetry it is equivalent to describe the evader's maneuvers
as a sequence of straights and turns (S and T), which provides
the equivalent mapping shown in Figure 2.2. A turn signifies that the
evader moves in the opposite direction to his previous move,
and a straight signifies that he continues in the same direction
as his previous move. Any n-bit sequence of lefts and rights
can be translated into an equivalent (n-1)-bit sequence of
straights and turns (e.g., LRRL becomes TST). Note that in
general several sequences of turns and straights may lead to
the same final position (for n = 3, TST, TTT and STS all
result in the evader occupying the position one step to the
left of his original position).

Figure 2.1 Possible Evader Positions in n Steps.
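The translation from left-right strings to straight-turn strings, and the grouping of straight-turn strings by final position, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original work: the helper names `lr_to_st`, `final_position` and `positions_by_sequence` are hypothetical, and the sketch assumes the move just before the sequence was a step to the right, so that S continues to the right and T reverses direction.

```python
from itertools import product


def lr_to_st(lr):
    """Translate an L/R string into the equivalent S/T string.

    Each move after the first is S if it repeats the previous move, T otherwise,
    so an n-character L/R string yields an (n-1)-character S/T string.
    """
    return "".join("S" if a == b else "T" for a, b in zip(lr, lr[1:]))


def final_position(st):
    """Net displacement of an S/T string, assuming the preceding move was +1 (right)."""
    direction, pos = 1, 0
    for move in st:
        if move == "T":
            direction = -direction  # a turn reverses the direction of travel
        pos += direction
    return pos


def positions_by_sequence(n):
    """Group all 2**n S/T strings of length n by the final position they reach."""
    groups = {}
    for seq in product("ST", repeat=n):
        s = "".join(seq)
        groups.setdefault(final_position(s), []).append(s)
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    print(lr_to_st("LRRL"))  # TST
    for pos, seqs in positions_by_sequence(3).items():
        print(pos, seqs)
```

For n = 3 the eight strings fall into n+1 = 4 groups at displacements -3, -1, +1 and +3, and the group at -1 contains exactly TST, TTT and STS, matching the example in the text.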



B. ONE-STEP GAME

The single-step pursuer-evader game has a simple
solution. With only one time step elapsing between the
pursuer's time of fire and weapon detonation, the evader can
always distribute himself uniformly over the two positions
achievable in one step, shown in Figure 2.3. On each step the
evader can continue straight with probability (1-p) or
turn with probability p. Since the intelligent pursuer will
limit his shot to one of the two feasible positions of the
evader when he fires (position 1 or 2 of Figure 2.3), the
game can be represented graphically as shown in Figure 2.4.
The minimax solution occurs at p = 0.5, and the corresponding
value of the game is 0.5. The optimal pursuer strategy is to
fire at position 1 or 2 with equal probability.
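A brute-force check of the one-step minimax is almost trivial, but it shows the pattern reused for the longer games: write the pursuer's best payoff as a function of the evader's turn probability p and minimize it over a grid. The function name and the grid resolution are illustrative choices, not taken from the original.

```python
def one_step_max(p):
    """Pursuer's best kill probability when the evader turns with probability p.

    The two reachable positions have probabilities p (turn) and 1 - p (straight);
    the pursuer aims at the likelier one.
    """
    return max(p, 1.0 - p)


grid = [i / 1000 for i in range(1001)]   # p = 0.000, 0.001, ..., 1.000
best_p = min(grid, key=one_step_max)     # the evader's minimax choice
print(best_p, one_step_max(best_p))      # 0.5 0.5
```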



Figure 2.4 Graphical Solution to the One-Step Game.

C. TWO-STEP GAME

The two-step pursuer-evader game is not nearly as simple
to solve as the one-step game. The solution was
found by starting with the hypothesis that the evader's
maneuver depends only on his previous maneuver and none
earlier; thus the probability of continuing in the same
direction as the last move is denoted by (1-p), with p being
the probability of moving in the opposite direction to the
previous move. The attainable positions for the evader and
the corresponding probabilities under this hypothesis
are shown in Figure 2.5. The pursuer can be expected to
select the position (1, 2 or 3) with the highest associated
probability, and the evader will select p so as to minimize
this maximum probability. The optimal value of p is then
found by solving:

$$\min_{p}\ \max\{\,p-p^2,\ p,\ (1-p)^2\,\} \quad \text{s.t. } 0 \le p \le 1.0$$

Graphically the solution is shown in Figure 2.6. Since
p ≥ p - p², the binding terms are p and (1-p)², and the
solution is found by solving the quadratic p = (1-p)²,
which has a root at p = (3-√5)/2 = 0.38197...; this value
is also the probability that the evader is in position 2 (and
likewise in position 3) of Figure 2.5, and thus the value of the
game. The proof that this evader strategy is optimal and that
(3-√5)/2 is the value of the game is complex; three different
proofs are given by Dubins Ref. [3], Isaacs Ref. [4] and
Ferguson Ref. [5]. The pursuer strategies in the multi-step
games are characterized by the non-existence of an optimal
strategy: the pursuer can always increase his expected
kill probability by waiting a few more time periods, but he
cannot wait indefinitely to fire or his payoff is zero.

This tension leads to strategies for the pursuer which
have payoffs arbitrarily close to, but not equal to, the
value of the game. Ferguson developed such a pursuer
strategy, which confirmed that (3-√5)/2 = 0.38197... is
the value of the two-step game.
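The closed-form root can be cross-checked numerically by minimizing the maximum of the three position probabilities from the program above over a fine grid. This is a sketch for checking the arithmetic only; the function name and grid step are illustrative assumptions.

```python
import math


def two_step_max(p):
    """Largest of the three position probabilities when the evader turns w.p. p."""
    return max(p - p * p, p, (1 - p) ** 2)


# Closed form: p = (1 - p)**2, i.e. p**2 - 3p + 1 = 0, smaller root.
p_star = (3 - math.sqrt(5)) / 2

# Brute-force confirmation on a grid of step 1e-5.
grid = [i / 100_000 for i in range(100_001)]
p_grid = min(grid, key=two_step_max)

print(f"{p_star:.5f} {two_step_max(p_star):.5f}")  # 0.38197 0.38197
print(f"{p_grid:.5f}")
```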

D. THREE-STEP GAME

As stated earlier, the three-step pursuer-evader game is
as yet unsolved. Bram has bounded the value of the
three-step game to

$$0.28423 < V < 0.28903.$$

This section will investigate previous near-optimal evader
strategies for the three-step game and the resulting upper
bounds on the game value.

1. Markov Hypothesis Strategy

The Markov Hypothesis for the n-step pursuer-evader
game is stated as follows: the probability that the evader
will go left or right (or, straight or turn) depends on
the previous n-1 moves but not on any moves further in the
past than the (n-1)st. This form of evader strategy makes
intuitive sense, since it does not seem likely that an
optimal evader strategy will depend upon information which
the pursuer already knows at the time of fire. The known
optimal strategies for the one and two-step games adhere to
the Markov Hypothesis. In the one-step game the optimal
evader turns or continues straight with equal probability,
independently of all previous moves (i.e., P(S) =
P(T) = P(L) = P(R) = 0.5). In the two-step game the optimal
evader uses a strategy where the probability of turning (or
continuing straight) depends only upon his previous move
(i.e., P(S) = P(L|L) = P(R|R) = 0.61803 and P(T) = P(L|R) =
P(R|L) = 0.38197).

The Markov Hypothesis will now be applied to the
three-step game. Since the evader conditions his next
move upon his previous two moves, his strategy can be
described by the 2x2 transition matrix shown in Figure 2.7.
The state of the evader at any time is S or T, since the
state is a function of the evader's last two moves (e.g., LL
or RR → S). In this transition matrix:

q1 = P(next state is S | last state was S)
q2 = P(next state is S | last state was T).

Figure 2.7 Markov Hypothesis Transition Matrix.

The four achievable positions for the evader in the three-step
game and the associated maneuver sequences are shown in
Figure 2.8. Let the variable W represent the final position
of the evader three steps after the time of fire; from
Figure 2.8 it can be seen that W ∈ {1, 2, 3, 4}. Let the variable
STATE represent the state (S or T) that the evader occupies
at the time of fire. The probability that the evader
occupies any final position, conditioned upon his initial
state, is a function of q1 and q2. For example, given
STATE = S, to arrive at W = 1 the sequence of transitions
undergone must be

S to T to S to S.

The probability of this occurrence can be written

$$P(W=1 \mid \text{STATE}=S) = (1-q_1)\,q_2\,q_1 .$$

The remaining seven conditional probabilities are:

$$\begin{aligned}
P(W=2 \mid \text{STATE}=S) &= (1-q_1)\,q_2\,(1-q_1) + (1-q_1)(1-q_2)^2 + q_1(1-q_1)\,q_2 \\
P(W=3 \mid \text{STATE}=S) &= (1-q_1)(1-q_2)\,q_2 + q_1(1-q_1)(1-q_2) + q_1^2(1-q_1) \\
P(W=4 \mid \text{STATE}=S) &= q_1^3 \\
P(W=1 \mid \text{STATE}=T) &= (1-q_2)\,q_2\,q_1 \\
P(W=2 \mid \text{STATE}=T) &= (1-q_2)\,q_2\,(1-q_1) + (1-q_2)^3 + q_2^2(1-q_1) \\
P(W=3 \mid \text{STATE}=T) &= (1-q_2)^2 q_2 + q_2(1-q_1)(1-q_2) + q_2\,q_1(1-q_1) \\
P(W=4 \mid \text{STATE}=T) &= q_2\,q_1^2
\end{aligned}$$
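The eight expressions above can be transcribed directly into Python and evaluated at Ferguson's solution q1 = 0.63397, q2 = 0.73205 (given below). This is a sketch for checking the algebra, not the original computation; the function name `markov_table` is hypothetical. Each row should sum to one, and the entries should reproduce Table I, whose largest value is the bound 0.29423.

```python
def markov_table(q1, q2):
    """P(W = w | STATE), w = 1..4, for the three-step Markov Hypothesis strategy.

    q1 = P(next state S | last state S), q2 = P(next state S | last state T).
    """
    p1, p2 = 1 - q1, 1 - q2
    return {
        "S": [
            p1 * q2 * q1,                                 # TSS
            p1 * q2 * p1 + p1 * p2 ** 2 + q1 * p1 * q2,   # TST, TTT, STS
            p1 * p2 * q2 + q1 * p1 * p2 + q1 ** 2 * p1,   # TTS, STT, SST
            q1 ** 3,                                      # SSS
        ],
        "T": [
            p2 * q2 * q1,                                 # TSS
            p2 * q2 * p1 + p2 ** 3 + q2 ** 2 * p1,        # TST, TTT, STS
            p2 ** 2 * q2 + q2 * p1 * p2 + q2 * q1 * p1,   # TTS, STT, SST
            q2 * q1 ** 2,                                 # SSS
        ],
    }


table = markov_table(0.63397, 0.73205)
for state, row in table.items():
    print(state, [f"{x:.5f}" for x in row], f"sum={sum(row):.6f}")
print("bound:", f"{max(max(r) for r in table.values()):.5f}")
```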

At any time the pursuer may choose to fire, he knows
which of the two states (S or T) the evader is in by
observing his last two moves. The optimal values of q1 and
q2 under this strategy are found by solving the following
non-linear problem:

$$\min_{q_1,\,q_2}\Big[\max_{\substack{j=1,2,3,4\\ i=S,T}} P(W=j \mid \text{STATE}=i)\Big] \quad \text{s.t. } 0 \le q_1,\,q_2 \le 1.$$

The solution, due to Ferguson, is q1 = 0.63397..., q2 =
0.73205..., with a corresponding upper bound on the game value
of 0.29423. The resulting matrix of conditional probabilities
is shown in Table I. In presenting this evader strategy,
Ferguson states that it is not known to be optimal, and in fact he
conjectures that no evader strategy of finite dependence is
optimal for the evader. The strategy of Bram, presented in
the next section, shows that an evader strategy of infinite
dependence does indeed result in a tighter bound on the
game value.

TABLE I
P(W=Ŵ|STATE) for Three-Step Markov Hypothesis Strategy

q1 = P(S|S) = 0.63397
q2 = P(S|T) = 0.73205

STATE    Ŵ=1      2        3        4
S       .16987   .29423   .28109   .25480
T       .12435   .28719   .29423   .29423

2. Infinite Dependence Strategy

As mentioned in Chapter I, the best existing evader
strategy for the three-step game was described by Joseph
Bram. This strategy can be described as an infinite sequence
of conditional probabilities that the evader will continue
straight, given the state S of his previous moves. If
the previous move by the evader was a turn, the evader is in
state S = 1, while if his previous k-1 moves have been straight,
following a turn, he is in state S = k. (Note that the state
space of S is infinite.) We denote a turn by T and a straight
by S as before. At each time step the evader continues straight
or turns with a probability dependent upon his state S. Let

$$p_k = P(\text{Straight} \mid S=k).$$

If the evader is in state k at some time n, at time n+3 he
can be in one of the four positions described by W
previously. There are eight possible 3-bit sequences of S's
and T's, which correspond to the four possible terminal
positions as shown in Figure 2.8. The probabilities associated
with each position W given k are as follows:

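$$\begin{aligned}
P(W=1 \mid S=k) &= (1-p_k)\,p_1\,p_2 \\
P(W=2 \mid S=k) &= (1-p_k)\,p_1(1-p_2) + (1-p_k)(1-p_1)^2 + p_k(1-p_{k+1})\,p_1 \\
P(W=3 \mid S=k) &= (1-p_k)(1-p_1)\,p_1 + p_k(1-p_{k+1})(1-p_1) + p_k\,p_{k+1}(1-p_{k+2}) \\
P(W=4 \mid S=k) &= p_k\,p_{k+1}\,p_{k+2}
\end{aligned}$$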

If the pursuer fires at time n, at position Ŵ, when S = k, his
expected payoff will be

$$P(W=\hat W \mid S=k).$$

The upper bound on the value of the game played with this
strategy is

$$\max_{k}\ \max_{\hat W}\ P(W=\hat W \mid S=k).$$

The evader will, of course, attempt to select his infinite
array of p_k's so as to minimize this bound, which is the
maximum payoff that the pursuer can achieve. The best set
of p_k's found by Bram is given in Table II, and the
resulting P(W=Ŵ|S=k) are shown in Table III. The upper
bound on the game value under this specific set of p_k's is
the maximum value found in Table III, 0.28903.

TABLE II
A Safe Set of p_k's for the Evader

 k     p_k
 1    .69290
 2    .62467
 3    .66775
 4    .65137
 5    .66241
 6    .65859
 7    .66135
 8    .66047
 9    .66116
10    .66096
11    .66114
12    .66109
13    .66114
14    .66113
15    .66114

TABLE III
P(W=Ŵ|S=k) Using the p_k's of Table II

 k      Ŵ=1      2        3        4
 1     .13292   .28903   .28903   .28903
 2     .16246   .27682   .28903   .27170
 3     .14381   .27905   .28903   .28818
 4     .15090   .27591   .28903   .28417
 5     .14612   .27634   .28903   .28852
 6     .14778   .27552   .28903   .28768
 7     .14658   .27560   .28903   .28880
 8     .14696   .27539   .28903   .28863
 9     .14666   .27539   .28903   .28892
10     .14675   .27534   .28903   .28889
11     .14667   .27534   .28903   .28896
12     .14669   .27532   .28903   .28896
13     .14667   .27532   .28903   .28898

In this strategy the decision to turn or continue straight
depends upon the previous moves, and that dependence may
extend infinitely far back; thus the evader is required to
maintain the infinite array of p_k's to execute this
near-optimal strategy.
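The four formulas for P(W|S=k) can be evaluated with the fifteen p_k's of Table II to reproduce Table III. The sketch below is only an illustration and assumes the 5-digit values printed in Table II are accurate enough; differences in the last printed digit are expected from that rounding. The name `bram_row` is hypothetical.

```python
# p_k = P(straight | S = k) for k = 1..15, copied from Table II.
P = [0.69290, 0.62467, 0.66775, 0.65137, 0.66241, 0.65859, 0.66135, 0.66047,
     0.66116, 0.66096, 0.66114, 0.66109, 0.66114, 0.66113, 0.66114]


def bram_row(k, p=P):
    """P(W = w | S = k), w = 1..4, for Bram's infinite-dependence strategy.

    The list is 0-indexed, so p_j is p[j - 1]; row k needs p_1, p_2, p_k, p_(k+1), p_(k+2).
    A turn always returns the evader to state 1; a straight moves him from k to k + 1.
    """
    p1, p2 = p[0], p[1]
    pk, pk1, pk2 = p[k - 1], p[k], p[k + 1]
    return [
        (1 - pk) * p1 * p2,                                                            # TSS
        (1 - pk) * p1 * (1 - p2) + (1 - pk) * (1 - p1) ** 2 + pk * (1 - pk1) * p1,     # TST, TTT, STS
        (1 - pk) * (1 - p1) * p1 + pk * (1 - pk1) * (1 - p1) + pk * pk1 * (1 - pk2),   # TTS, STT, SST
        pk * pk1 * pk2,                                                                # SSS
    ]


rows = {k: bram_row(k) for k in range(1, 14)}  # Table III lists k = 1..13
for k, row in rows.items():
    print(k, " ".join(f"{x:.5f}" for x in row))
bound = max(max(r) for r in rows.values())
print("bound:", f"{bound:.5f}")
```

The third column (Ŵ = 3) comes out at about 0.28903 for every k, which is the property used at the end of Section III.E.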

3. Sub-Markov Strategy

The strategy presented here is due to Bouchoux
Ref. [6] and is characterized by a sequence of evader moves
that is not Markovian in itself but is generated by a
substructure which is Markovian; hence the description
Sub-Markov. This form of strategy is suggested by its use in
providing optimal strategies in emission-prediction games
described by Blackwell Ref. [7] and Matula Ref. [8]. The
pursuer-evader game, while similar to emission-prediction games,
is complicated by the fact that several distinct sequences
of moves may lead to the same terminal position.

Since the pursuer (predictor) must fire at one of those
terminal points and not at a specific sequence of moves, the
game is more complex. Bouchoux describes a strategy based
upon three states, A, B and C, through which the evader
transitions in a Markovian manner. When in state A the
evader always turns, while in states B and C he always goes
straight. After each move, straight or turn, the evader
transitions between states according to a 3x3 transition
matrix and is ready for his next move. This strategy is
finite in the memory required by the evader, and by optimizing
the transition matrix Bouchoux obtained a bound on the game
value of 0.28922.
III. EXTENDED MARKOV STRATEGY



A. MOTIVATION AND DESCRIPTION 

The evader strategy to be investigated will be called
Extended Markov because it is an extension of the finite
dependence of the Markov Hypothesis strategy. The dependence
will be finite but will extend beyond the previous n-1
steps. For the three-step game discussed in II.D.1., the best
Markov Hypothesis strategy for the evader resulted in an upper
bound on the game value of 0.29423. If the dependence is
restricted to only the previous move instead of the previous
two moves, the best strategy results in an upper bound of
0.29630. (This is equivalent to adding the constraint q1 = q2
to the non-linear problem described in II.D.1., whose solution
is then q1 = q2 = 2/3.) Since Bram's strategy showed that the
Markov Hypothesis was not optimal for the three-step game, it
seems that a Markovian strategy whose dependence is finite but
extends beyond the last n-1 moves might result in a tighter
bound on the game value than previously obtained. This is
the class of strategies to be called Extended Markov. These
strategies for the three-step game, Markovian in nature,
will arise from a dependence upon the last three or more
moves and will be called n-dependent strategies, where n
represents the level of dependence. In this context, the
Markov Hypothesis strategy for the three-step game is the
two-dependent strategy.
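With q1 = q2 = q every move is straight with the same probability q regardless of history, so the eight sequence probabilities collapse to polynomials in q alone. A grid search over q (an illustrative choice, not the method used in the thesis; the function name is hypothetical) recovers q = 2/3 and the bound 8/27 = 0.29630, where the probabilities of positions 3 and 4 are equal.

```python
def one_dependent_max(q):
    """Pursuer's best payoff when every move is straight with probability q."""
    p = 1 - q
    w1 = p * q * q                           # TSS
    w2 = p * q * p + p ** 3 + q * p * q      # TST, TTT, STS
    w3 = q * q * p + p * p * q + q * p * p   # SST, TTS, STT
    w4 = q ** 3                              # SSS
    return max(w1, w2, w3, w4)


grid = [i / 100_000 for i in range(100_001)]
q_best = min(grid, key=one_dependent_max)
print(f"q = {q_best:.4f}, bound = {one_dependent_max(q_best):.5f}, 8/27 = {8 / 27:.5f}")
```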

B. GENERAL N-DEPENDENT STRATEGY 

In the n-dependent strategy the evader determines
his next move based upon his previous n moves. The evader
can be thought of as controlling 2^n variables, each being
the probability of going (say) right given that the previous n
steps have been in a certain sequence. We utilize the
left-right symmetry of the problem by considering only paths
where the last move is to the (say) right, resulting in only
2^(n-1) variables, each representing the probability of going
straight given that the last n steps have produced a
certain (n-1)-bit sequence of straights and turns. The general
n-dependent strategy can be described by a Markov chain
having 2^(n-1) states, corresponding to the 2^(n-1) different
(n-1)-bit sequences of straights and turns which are possible
based on the last n moves (conditioning upon the last
n moves is equivalent to conditioning on the last n-1
straights or turns). From each of the 2^(n-1) states there is
a fixed probability that the evader will maneuver to each of
the four final positions W in the next three steps. A
2^(n-1) x 2^(n-1) transition matrix is used to describe the
conditional probability of turning or continuing straight given
the current state ((n-1)-bit sequence). Since the state
describes the previous n moves in terms of straights and
turns, only two transitions are possible from each of the
states: the first n-2 bits of the state transitioned to are
the last n-2 bits of the state transitioned from, and the last
bit is S or T depending upon the new move. Due to this
structure the transition matrix is completely defined by
2^(n-1) variables, q_i (i = 1, ..., 2^(n-1)), which represent
the probability of continuing straight given the current state;
the other transition probability for that state is (1 - q_i).
Using a transition matrix so constructed, the conditional
probability of ending in each of the four final positions
(W = 1, 2, 3 or 4) can be found. To arrive at position 1, for
example, the sequence of states transitioned must result in the
terminating three-bit sequence TSS, as can be seen from
Figure 2.8. Thus P(W=Ŵ|STATE) is a function of the variables
q_i, and the best n-dependent strategy is found by solving the
following non-linear program:

$$\min_{q}\Big[\max_{\hat W,\ \text{STATE}} P(W=\hat W \mid \text{STATE})\Big] \quad \text{s.t. } 0 \le q_i \le 1.0,\ \ i = 1, 2, \ldots, 2^{n-1}.$$

For general n, the above program involves minimizing the
maximum of 2^(n+1) (4 positions x 2^(n-1) states) non-linear
functions of up to 2^(n-1) variables. No analytic solution has
been found; in later sections near-optimal solutions are found
by non-linear search techniques.
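The construction above (shift the state one bit, append the new move, multiply the transition probabilities along each three-move path) can be sketched generically in Python. The function names, and the convention that a state string lists its oldest S/T bit first, are assumptions made for this illustration. Evaluated at the three-dependent values reported in Section III.C (q1 = P(S|SS) = 0.66163, q2 = P(S|ST) = 0.70054, q3 = P(S|TS) = 0.62489, q4 = P(S|TT) = 0.70054) it should give the bound 0.28964, and with q1 = q3, q2 = q4 set to the Markov Hypothesis values it should give 0.29423.

```python
from itertools import product


def final_position(moves):
    """Net displacement of a sequence of S/T moves; the previous move is taken to be +1."""
    direction, pos = 1, 0
    for m in moves:
        if m == "T":
            direction = -direction
        pos += direction
    return pos


def conditional_table(q, steps=3):
    """Return {state: [P(W=1|state), ..., P(W=steps+1|state)]} for an n-dependent strategy.

    q maps every (n-1)-bit S/T state string (oldest bit first) to P(next move is S).
    W = 1 is the position reached by turning and then going straight (TSS...).
    """
    table = {}
    for start in q:
        row = [0.0] * (steps + 1)
        for moves in product("ST", repeat=steps):
            prob, state = 1.0, start
            for m in moves:
                prob *= q[state] if m == "S" else 1.0 - q[state]
                state = state[1:] + m  # forget the oldest bit, record the new move
            row[(final_position(moves) + steps) // 2] += prob
        table[start] = row
    return table


def upper_bound(q, steps=3):
    """Objective of the non-linear program: the pursuer's best (state, aim point) payoff."""
    return max(max(row) for row in conditional_table(q, steps).values())


three_dep = {"SS": 0.66163, "ST": 0.70054, "TS": 0.62489, "TT": 0.70054}
print(f"{upper_bound(three_dep):.5f}")

# q1 = q3 and q2 = q4 at the Markov Hypothesis values: a two-dependent strategy in disguise.
embedded = {"SS": 0.63397, "ST": 0.73205, "TS": 0.63397, "TT": 0.73205}
print(f"{upper_bound(embedded):.5f}")
```

Because the objective is a maximum of many smooth functions, it is non-differentiable where several of them tie, which is why the thesis relies on a feasible direction search rather than plain gradient steps.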
C. THREE-DEPENDENT STRATEGY

The first extension of the Markov Hypothesis strategy is
the three-dependent strategy, described by four states (SS, ST,
TS, TT) and the 4x4 transition matrix shown in Figure 3.1,
where

q1 = P(next move is straight | state is SS),

or equivalently,

q1 = P(next state is SS | last state was SS),

with q2, q3 and q4 defined in the same way for states ST, TS
and TT. The sixteen conditional probabilities of terminating in
one of the four positions W, given that the evader starts from one
of the four states, are listed in Table IV. The best solution
found using the three-dependent strategy gives an upper
bound on the game value of 0.28964 when

q1 = 0.66163    q3 = 0.62489
q2 = 0.70054    q4 = 0.70054
The matrix of conditional probabilities evaluated at this
point is given in Table V. This solution was found by an
improving feasible direction search started from a known
"good" solution. For the three-dependent strategy a good
starting point is found by applying the known two-dependent
(Markov Hypothesis) solution to the three-dependent structure.
If one applies the restriction q1 = q3 and q2 = q4 to the
three-dependent strategy, it is equivalent to the strategy
discussed in II.D.1., with an upper bound of 0.29423 when
q1 = q3 = 0.63397 and q2 = q4 = 0.73205.

                        NEXT STATE
                   SS      ST      TS      TT
LAST STATE  SS     q1     1-q1     0       0
            ST     0       0       q2     1-q2
            TS     q3     1-q3     0       0
            TT     0       0       q4     1-q4

Figure 3.1 4x4 Transition Matrix for 3-Dependent Strategy.

TABLE IV
P(W=Ŵ|STATE) for 3-Dependent Strategy
Notation: p_i = 1 - q_i, i = 1, 2, 3, 4

P(W=1|SS) = p1 q2 q3
P(W=2|SS) = p1 q2 p3 + p1 p2 p4 + q1 p1 q2
P(W=3|SS) = p1 p2 q4 + q1 p1 p2 + q1 q1 p1
P(W=4|SS) = q1 q1 q1
P(W=1|ST) = p2 q4 q3
P(W=2|ST) = p2 q4 p3 + p2 p4 p4 + q2 p3 q2
P(W=3|ST) = p2 p4 q4 + q2 p3 p2 + q2 q3 p1
P(W=4|ST) = q2 q3 q1
P(W=1|TS) = p3 q2 q3
P(W=2|TS) = p3 q2 p3 + p3 p2 p4 + q3 p1 q2
P(W=3|TS) = p3 p2 q4 + q3 p1 p2 + q3 q1 p1
P(W=4|TS) = q3 q1 q1
P(W=1|TT) = p4 q4 q3
P(W=2|TT) = p4 q4 p3 + p4 p4 p4 + q4 p3 q2
P(W=3|TT) = p4 p4 q4 + q4 p3 p2 + q4 q3 p1
P(W=4|TT) = q4 q3 q1

TABLE V
Good Evader Strategy in 3-Dependent Case

q1 = P(S|SS) = 0.66163     q3 = P(S|TS) = 0.62489
q2 = P(S|ST) = 0.70054     q4 = P(S|TT) = 0.70054

P(W=Ŵ|STATE)
STATE    Ŵ=1      2        3        4
SS      .14812   .27609   .28615   .28964
ST      .13109   .28964   .28964   .28964
TS      .16421   .28033   .28191   .27355
TT      .13109   .28964   .28964   .28964

Analogously, any near-optimal solution to the n-dependent
strategy will provide a "good" initial solution to the
(n+1)-dependent strategy. While the solution given above
for the three-dependent strategy is not known to be optimal,
but is rather a local minimum of the problem described in
III.B., it represents a significant improvement over the
two-dependent strategy (0.29423) and is close in value to
the infinite strategy of Bram (0.28903). Appendix A presents
an analysis of the above three-dependent solution and
shows that the proposed solution satisfies the first-order
Kuhn-Tucker conditions (necessary but not sufficient) for a
global minimum. It is interesting to note that in the
three-dependent solution q2 = q4, that is,
P(S|ST) = P(S|TT). Additionally, in order for the pursuer to
receive his maximum achievable payoff he must refrain from
attacking when the state is TS, or be limited to a payoff of
0.28191.

D. FOUR AND FIVE-DEPENDENT STRATEGIES

The treatment of the four-dependent and five-dependent
strategies is equivalent to that of the three-dependent
strategy described above, with the state space and the number
of variables expanded to eight and sixteen, respectively.
Good solutions to the four and five-dependent strategies were
found, as in the three-dependent case, by starting at a known
near-optimal set of values for the q_i's and conducting an
improving feasible direction search until a local minimum was
found. The best solutions thus found for the four and
five-dependent strategies and the resulting conditional
probability matrices are shown in Tables VI and VII.

TABLE VI
Good Evader Strategy in 4-Dependent Case

q1 = P(S|SSS) = 0.65931     q5 = P(S|TSS) = 0.66543
q2 = P(S|SST) = 0.69579     q6 = P(S|TST) = 0.69579
q3 = P(S|STS) = 0.62474     q7 = P(S|TTS) = 0.62474
q4 = P(S|STT) = 0.69579     q8 = P(S|TTT) = 0.69579

P(W=Ŵ|STATE)
STATE    Ŵ=1      2        3        4
SSS     .14809   .27677   .28854   .28659
SST     .13224   .28925   .28925   .28925
STS     .16312   .27814   .28465   .27409
STT     .13224   .28925   .28925   .28925
TSS     .14543   .27606   .28925   .28925
TST     .13224   .28925   .28925   .28925
TTS     .16312   .27814   .28465   .27409
TTT     .13224   .28925   .28925   .28925
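The four-dependent solution of Table VI can be checked the same way. The sketch repeats the hypothetical `conditional_table` helper so that it runs on its own, and it assumes the q_i of Table VI are attached to the states in the binary order SSS, SST, ..., TTT listed there (oldest bit first).

```python
from itertools import product


def final_position(moves):
    """Net displacement of a sequence of S/T moves; the previous move is taken to be +1."""
    direction, pos = 1, 0
    for m in moves:
        if m == "T":
            direction = -direction
        pos += direction
    return pos


def conditional_table(q, steps=3):
    """{state: [P(W=1|state), ..., P(W=steps+1|state)]}; q maps states to P(next move is S)."""
    table = {}
    for start in q:
        row = [0.0] * (steps + 1)
        for moves in product("ST", repeat=steps):
            prob, state = 1.0, start
            for m in moves:
                prob *= q[state] if m == "S" else 1.0 - q[state]
                state = state[1:] + m  # shift in the new move
            row[(final_position(moves) + steps) // 2] += prob
        table[start] = row
    return table


# Table VI values (q1 ... q8).
four_dep = {
    "SSS": 0.65931, "SST": 0.69579, "STS": 0.62474, "STT": 0.69579,
    "TSS": 0.66543, "TST": 0.69579, "TTS": 0.62474, "TTT": 0.69579,
}
table = conditional_table(four_dep)
for state, row in table.items():
    print(state, " ".join(f"{x:.5f}" for x in row))
bound = max(max(row) for row in table.values())
print("bound:", f"{bound:.5f}")
```

All four states ending in T share the same q and the same successor probabilities, so their rows coincide, as they do in Table VI.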

E. CHARACTERISTICS OF THREE, FOUR AND FIVE-DEPENDENT STRATEGIES

The solutions found for the three, four and five-dependent
strategies, outlined in Tables V, VI and VII, show several
revealing characteristics. In each case the conditional
probability of continuing straight given the (n-1)-bit
state does not depend on all of the information contained
in that (n-1)-bit sequence. The probabilities depend
only upon the number of time steps elapsed since the last
turn maneuver, and not upon any turn-straight information
further in the past than that last turn. For example,
letting t denote the number of time steps since the last
turn, in the five-dependent solution:

$$\begin{aligned}
q_2 = q_4 = q_6 = q_8 = q_{10} = q_{12} = q_{14} = q_{16} &= P(S \mid t=1) \\
q_3 = q_7 = q_{11} = q_{15} &= P(S \mid t=2) \\
q_5 = q_{13} &= P(S \mid t=3) \\
q_9 &= P(S \mid t=4) \\
q_1 &= P(S \mid t>4)
\end{aligned}$$

TABLE VII
Good Evader Strategy in 5-Dependent Case

q1 = P(S|SSSS) = 0.66120     q9  = P(S|TSSS) = 0.65034
q2 = P(S|SSST) = 0.69385     q10 = P(S|TSST) = 0.69385
q3 = P(S|SSTS) = 0.62470     q11 = P(S|TSTS) = 0.62470
q4 = P(S|SSTT) = 0.69385     q12 = P(S|TSTT) = 0.69385
q5 = P(S|STSS) = 0.66698     q13 = P(S|TTSS) = 0.66698
q6 = P(S|STST) = 0.69385     q14 = P(S|TTST) = 0.69385
q7 = P(S|STTS) = 0.62470     q15 = P(S|TTTS) = 0.62470
q8 = P(S|STTT) = 0.69385     q16 = P(S|TTTT) = 0.69385

P(W=Ŵ|STATE)
STATE    Ŵ=1      2        3        4
SSSS    .14685   .27541   .28867   .28907
SSST    .13270   .28910   .28910   .28910
SSTS    .16267   .28569   .28910   .27097
SSTT    .13270   .28910   .28910   .28910
STSS    .14435   .27975   .28910   .28680
STST    .13270   .28910   .28910   .28910
STTS    .16267   .28569   .28910   .27097
STTT    .13270   .28910   .28910   .28910
TSSS    .15156   .27670   .28742   .28432
TSST    .13270   .28910   .28910   .28910
TSTS    .16267   .28569   .28910   .27097
TSTT    .13270   .28910   .28910   .28910
TTSS    .14435   .27975   .28910   .28680
TTST    .13270   .28910   .28910   .28910
TTTS    .16267   .28569   .28910   .27097
TTTT    .13270   .28910   .28910   .28910

It is hypothesized that this characteristic holds for the
optimal form of any n-dependent strategy. If so, the
n-dependent strategy is a finite (truncated) version of the
Bram strategy presented in II.D.2., and as the level of
dependence n is increased without bound, the bound of 0.28903
of Bram is expected to be approached.

Each of the investigated strategies is also characterized
by having some states in which the pursuer must refrain from
firing, or else forfeit his ability to maximize his payoff.
As the level of dependence increases, however, the penalty to
the pursuer who fires when the evader is in one of these
states diminishes. Table III shows that under Bram's
strategy there is no time at which the pursuer cannot
achieve his maximum payoff, provided he always fires at
position W = 3.
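The observation that the q_i depend only on t can be made concrete: the sketch below builds all sixteen five-dependent q_i from the five values P(S|t=1), ..., P(S|t>4) of Table VII and evaluates the resulting bound. As before, the helper functions are hypothetical and written only for this illustration; `None` stands for "no turn among the last four moves".

```python
from itertools import product

# P(S | t) from Table VII; None stands for t > 4 (no turn recorded in the state).
BY_T = {1: 0.69385, 2: 0.62470, 3: 0.66698, 4: 0.65034, None: 0.66120}


def time_since_turn(state):
    """Steps since the most recent T in an S/T state (oldest bit first), or None."""
    for t, m in enumerate(reversed(state), start=1):
        if m == "T":
            return t
    return None


def final_position(moves):
    """Net displacement of a sequence of S/T moves; the previous move is taken to be +1."""
    direction, pos = 1, 0
    for m in moves:
        if m == "T":
            direction = -direction
        pos += direction
    return pos


def upper_bound(q, steps=3):
    """Largest P(W = w | state) over all states and aim points."""
    best = 0.0
    for start in q:
        row = [0.0] * (steps + 1)
        for moves in product("ST", repeat=steps):
            prob, state = 1.0, start
            for m in moves:
                prob *= q[state] if m == "S" else 1.0 - q[state]
                state = state[1:] + m
            row[(final_position(moves) + steps) // 2] += prob
        best = max(best, max(row))
    return best


five_dep = {"".join(s): BY_T[time_since_turn("".join(s))] for s in product("ST", repeat=4)}
print(five_dep["TSSS"], five_dep["SSTS"], five_dep["SSSS"])  # 0.65034 0.6247 0.6612
print("bound:", f"{upper_bound(five_dep):.5f}")
```

Only five numbers are needed to specify the sixteen q_i, which is the sense in which the n-dependent solutions behave like a truncated Bram strategy.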
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IV. FOUR-STEP GAME 



« 



The four-step pursuer-evader game has been the subject 
of little interest due to the unsolved nature of the three- 
step game. We shall briefly look at the four-step game and 
discover that the apparent characteristic structure of the 
three-step extended Markov strategies does not extend to the 
four-step game. Given a four-step time delay betveen the 
attacker’s time of fire and subsequent detonation, the evader 
may achieve five different positions through the sixteen 
different four-bit sequences of turns and straights as shown 
in Figure 4.1. The Markov Hypothesis strategy solution to 
the four-step game is due to Washburn Ref.f9]. In the four- 
step game the Markov Hypothesis has dependence extending to 
the last three moves, the best strategy under this hypothesis 
bounds the value of the game to 0.23740 or below, the q 
values and resulting conditional probability matrix is shown 
in Table VIII. The first extended Markov strategy of the 
four-step game, the only one investigated, is the four- 
dependent strategy; in this strategy dependence reaches back 
to the last four moves. The best solution found using the 
four-dependent strategy is shown in Table IX and provides an 
upper bound of 0.23734* While this is an improvement over 
the Markov Hypothesis solution of Washburn, the improvement 
is very slight. Additionally, no underlying characteristic 
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such as discussed in 
Markov strategies is 
dependent strategies 



III.E. for the three-step extended 
apparent from the three and four- 
investigated for the four-step game. 
Figure 4.1  Achievable Evader Positions in Four-Step Game
[Figure: the sixteen four-move sequences TSSS, TSST, TSTS, TTST, TTSS, TSTT, TTTT, STTT, STTS, TTTS, STST, SSTS, SSTT, STSS, SSST, SSSS, placed at the evader positions they reach.]



TABLE VIII

Markov-Hypothesis Strategy for Four-Step Game

q1 = 0.69681    q2 = 0.69681
q3 = 0.70169    q4 = 0.69675

                    P(W = Ŵ | STATE)

STATE   Ŵ = 1    Ŵ = 2    Ŵ = 3    Ŵ = 4    Ŵ = 5
SS      .10330   .18677   .23739   .23678   .23575
ST      .10329   .18511   .23709   .23710   .23740
TS      .10163   .18615   .23740   .23740   .23740
TT      .10331   .18512   .23709   .23710   .23738
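The entries of Table VIII can be recomputed from the four q values. The Python sketch below assumes that q1, ..., q4 are the probabilities of a straight given that the last two moves were SS, ST, TS and TT respectively, and that Ŵ = 1, ..., 5 index net displacements of -4, -2, 0, +2 and +4 jumps measured in the direction of the evader's last jump. These conventions are an interpretation rather than something stated in the text; with them, the computed probabilities appear to agree with Table VIII to about five decimal places, and the pursuer's largest entry recovers the bound of 0.23740. The names `Q` and `position_distribution` are hypothetical.

```python
from itertools import product

# Table VIII values; the assignment of q1..q4 to the states SS, ST, TS, TT
# (the last two moves) is an assumption made for this sketch.
Q = {"SS": 0.69681, "ST": 0.69681, "TS": 0.70169, "TT": 0.69675}


def displacement(moves):
    """S repeats the previous jump direction, T reverses it; +1 = last jump."""
    direction, position = 1, 0
    for move in moves:
        if move == "T":
            direction = -direction
        position += direction
    return position


def position_distribution(q, state, n_steps):
    """Return [P(W=1 | state), ..., P(W=n_steps+1 | state)].

    q[s] is the probability of a straight when the last two moves were s;
    W = 1 is the largest displacement against the last jump and
    W = n_steps + 1 the largest displacement along it.
    """
    probs = [0.0] * (n_steps + 1)
    for moves in product("ST", repeat=n_steps):
        p, s = 1.0, state
        for move in moves:
            p *= q[s] if move == "S" else 1.0 - q[s]
            s = s[1] + move  # slide the two-move memory window
        probs[(displacement(moves) + n_steps) // 2] += p
    return probs


table = {s: position_distribution(Q, s, 4) for s in Q}
for s, row in table.items():
    print(s, " ".join(f"{p:.5f}" for p in row))

# The pursuer picks the largest entry; this bounds the value of the game.
value_bound = max(max(row) for row in table.values())
print(f"max P(W | STATE) = {value_bound:.5f}")
```

The same function applies to any memory-two strategy; only the dictionary of q values changes.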






TABLE IX

Three-Dependent Strategy to Four-Step Game

q1 = 0.69724    q5 = 0.69728
q2 = 0.69727    q6 = 0.69727
q3 = 0.70466    q7 = 0.70469
q4 = 0.69654    q8 = 0.69724

                    P(W = Ŵ | STATE)

STATE   Ŵ = 1    Ŵ = 2    Ŵ = 3    Ŵ = 4    Ŵ = 5
SSS     .10306   .18769   .23624   .23668   .23634
SST     .10294   .18508   .23733   .23731   .23733
STS     .10053   .18828   .23654   .23733   .23732
STT     .10329   .18518   .23731   .23712   .23709
TSS     .10457   .18826   .23622   .23612   .23482
TST     .10294   .18508   .23733   .23731   .23733
TTS     .10052   .18827   .23654   .23733   .23733
TTT     .10306   .18509   .23731   .23721   .23733
V. CONCLUSIONS AND REMARKS
The three-step pursuer-evader game remains unsolved.

Investigation of the extended Markov strategies has produced
evader strategies that improve on the Markov Hypothesis, but
they are not known to be better than the infinite-memory
strategy of Bram; in fact, it is hypothesized that the
n-dependent extended Markov strategy for the three-step game
is a finite approximation to the strategy of Bram. In this
respect the results are not entirely disappointing: they
provide a finite strategy that appears to converge rather
rapidly to a strategy equivalent to Bram's infinite-memory
strategy. The five-dependent strategy for the three-step
game relies upon five distinct variables:

q-] qq q3 q9 

which provide an upper bound of 0.28910, reasonably close to
the bound of 0.28903 provided by Bram's infinite-memory
strategy. The near-optimal extended Markov strategies
presented in Tables V, VI, and VII represent local minima of
the non-linear programming problem discussed in III.B.

While these clearly improve on the Markov Hypothesis
strategy, they may not be the global minima within the
extended Markov structure. As the level of dependence in the
extended Markov strategies increases, the mathematical
complexity increases disproportionately; only the apparent
characteristic of these strategies discussed in III.E. makes
them even remotely attractive.

It remains to be explained why the three-step game is
apparently non-Markovian in its optimal evader strategy
while the one- and two-step games are Markovian. The evader
strategy proposed in this thesis, as well as the strategy
described by Bouchoux, represents an abstraction from the
strict Markov Hypothesis solution; although both strategies
lower the pursuer's maximum payoff, neither is as tight as
the infinite strategy of Bram, which is strictly
non-Markovian in nature. While improved finite strategies
may be possible through further abstraction from a strictly
Markovian strategy, it has been conjectured that no finite
strategy is optimal for the evader. This is known to be true
for the pursuer, since he must observe the evader for an
ever-increasing length of time if he wishes to achieve
optimality (with the exception of the one-step game, where
both sides have finite optimal strategies). Bouchoux
suggests that a generalization of his sub-Markov strategy,
involving three distinct Markov states each with some fixed
probability of generating a straight or a turn, might
provide a tighter bound on the game value because it departs
further from Markov behavior. However, the mathematical
complexity of locating optimal or near-optimal strategies
within this framework is considerable.

The four-step game appears even more difficult. The
Markov Hypothesis solution is shown to be sub-optimal, being
dominated by the three-dependent extended Markov strategy of
Table IX. The strategies found for the four-step game in
Tables VIII and IX appear to preclude an extension of Bram's
infinite strategy to the four-step game. The apparent
dissimilarity among the known near-optimal evader strategies
for the two-, three-, and four-step games is perplexing.

The discrete evasion game on a two- or three-dimensional
surface is another area that holds promise for future
research. The work of Ferguson solves the two-step game for
a special class of graphs he calls restricted n-graphs;
however, the two-step game on more general two-dimensional
surfaces, as well as the three-step game, remain unsolved.

The discrete pursuer-evader game, as described by Isaacs 
in 1954, was generated as a simplification of a much more 
complex problem. The continuing mystery surrounding all but 
the simplest of these "simplified" games provides a wealth 
of opportunity and motivation for future research. 
APPENDIX A



INVESTIGATION OF THE THREE-STEP EXTENDED MARKOV STRATEGY 

In III.B., the general n-dependent extended Markov
strategy was presented. The best solution found for the
case n = 3 is given in Table V. As stated earlier, this
solution is not known to be optimal, but it can be shown to
satisfy the first-order Kuhn-Tucker conditions, which are
necessary but not sufficient for a global minimum.

For the three-dependent case the problem may be stated
as follows. There are sixteen separate functions (see Table
IV) from which the maximum will be selected by the pursuer's
choice of W and STATE (i.e., by his selection of aim point
and time of fire); the evader must select the q_i's so as to
minimize this maximum payoff. Let f_1, ..., f_16 represent
the sixteen functions described in Table IV; the problem
then becomes:

$$\min_{q_1,\dots,q_4}\ \max_{\hat{W},\,\mathrm{STATE}} P(W = \hat{W} \mid \mathrm{STATE}) \;=\; \min_{q_1,\dots,q_4}\ \max_{j = 1,\dots,16} f_j \qquad \text{s.t.}\quad 0.0 \le q_i \le 1.0,\quad i = 1,2,3,4$$
Introducing a dummy variable q_5, the above non-linear
program may be equivalently written:

$$\min\ q_5$$
$$\text{s.t.}\quad f_j - q_5 \le 0.0,\qquad j = 1,\dots,16$$
$$q_i - 1.0 \le 0.0,\qquad i = 1,\dots,4$$
$$q_i \ge 0.0,\qquad i = 1,\dots,5$$

The structure of this problem allows an additional
condition to be placed upon the optimal solution:

$$0.0 < q_i < 1.0,\qquad i = 1,2,3,4.$$

Close inspection of the functions f_j shows that if

$$q_i = 0.0 \quad\text{or}\quad p_i = 1.0 - q_i = 0.0,$$

then at least one of the f_j's will have a value of 0.0. If
any f_j = 0.0, then the remaining three f_j's associated with
the same initial state must sum to 1.0, since for any
initial state:

$$P(W = 1, 2, 3 \text{ or } 4 \mid \mathrm{STATE}) = 1.0$$

The largest of three non-negative numbers that sum to 1.0
must be at least 1/3, which is greater than the known upper
bound on the value of the game. Therefore:

$$0.0 < q_i < 1.0,\qquad i = 1,\dots,4$$
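The averaging step in this argument can be written in one line. In the sketch below, f_a, f_b and f_c are labels introduced only for illustration, standing for the three remaining functions of the state in question, and 0.28903 is the upper bound from Bram's strategy quoted in Section V:

```latex
f_a + f_b + f_c = 1.0,\quad f_a, f_b, f_c \ge 0.0
\;\Longrightarrow\;
\max(f_a, f_b, f_c) \;\ge\; \tfrac{1}{3}\,(f_a + f_b + f_c) \;=\; \tfrac{1}{3} \approx 0.33333 \;>\; 0.28903
```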
Based upon the above characteristic of the problem, the
constraints

$$q_i - 1.0 \le 0.0,\qquad i = 1,\dots,4$$

will not be binding at the optimal solution and may be
dropped without consequence, resulting in:

$$\min\ q_5$$
$$\text{s.t.}\quad f_j - q_5 \le 0.0,\qquad j = 1,\dots,16$$
$$q_i \ge 0.0,\qquad i = 1,\dots,5$$


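The first-order Kuhn-Tucker conditions for the above
problem require that, at an optimal point, there exist
multipliers λ_j such that:

$$\frac{\partial L}{\partial q_i} \ge 0.0,\qquad q_i\,\frac{\partial L}{\partial q_i} = 0.0,\qquad q_i \ge 0.0,\qquad i = 1,\dots,5$$

$$\frac{\partial L}{\partial \lambda_j} \ge 0.0,\qquad \lambda_j\,\frac{\partial L}{\partial \lambda_j} = 0.0,\qquad \lambda_j \le 0.0,\qquad j = 1,\dots,16$$

where:

$$L(q,\lambda) = q_5 - \sum_{j=1}^{16} \lambda_j\,(f_j - q_5)$$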
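Since q_1, ..., q_4 are strictly positive at the optimum (as
shown above), and q_5 ≥ f_j for every j while the four f_j's
of any state sum to 1.0 (so that q_5 ≥ 1/4), these
conditions may be further modified:

$$\frac{\partial L}{\partial q_i} \ge 0.0,\qquad q_i\,\frac{\partial L}{\partial q_i} = 0.0,\qquad q_i > 0.0,\qquad i = 1,\dots,5$$

so that each ∂L/∂q_i must vanish.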

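In the proposed near-optimal solution in Table V, seven
of the sixteen inequality constraints are binding; that is:

$$f_4 = f_6 = f_7 = f_8 = f_{14} = f_{15} = f_{16} = q_5 = 0.28964$$

The remaining nine constraints are slack, so by
complementary slackness it follows that:

$$\lambda_1 = \lambda_2 = \lambda_3 = \lambda_5 = \lambda_9 = \lambda_{10} = \lambda_{11} = \lambda_{12} = \lambda_{13} = 0.0$$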
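The binding set can be checked numerically. The Python sketch below assumes that q1, ..., q4 are the probabilities of a straight given that the last two moves were SS, ST, TS and TT, that S repeats the previous jump direction and T reverses it, that W = 1, ..., 4 index displacements of -3, -1, +1 and +3 jumps along the last jump, and that j = 4(i - 1) + W for the i-th state. Table IV is not reproduced here, so this indexing is an assumption; under it, the Table V values q1 = 0.66163, q2 = 0.70054, q3 = 0.62489, q4 = 0.70054 give a maximum of about 0.28964, attained by exactly the seven functions listed above. The names `f_values` and `binding` are hypothetical.

```python
from itertools import product

# Table V values quoted in Appendix A; the state order SS, ST, TS, TT for
# q1..q4 and the indexing j = 4*(state index) + W are assumptions.
STATES = ["SS", "ST", "TS", "TT"]
Q = dict(zip(STATES, [0.66163, 0.70054, 0.62489, 0.70054]))


def displacement(moves):
    """S repeats the previous jump direction, T reverses it; +1 = last jump."""
    direction, position = 1, 0
    for move in moves:
        if move == "T":
            direction = -direction
        position += direction
    return position


def f_values(q, n_steps=3):
    """Return [f_1, ..., f_16]: P(W | STATE) for W = 1..4, state by state."""
    f = []
    for state in STATES:
        probs = [0.0] * (n_steps + 1)
        for moves in product("ST", repeat=n_steps):
            p, s = 1.0, state
            for move in moves:
                p *= q[s] if move == "S" else 1.0 - q[s]
                s = s[1] + move  # slide the two-move memory window
            probs[(displacement(moves) + n_steps) // 2] += p
        f.extend(probs)
    return f


f = f_values(Q)
q5 = max(f)  # the dummy variable sits at the largest f_j
binding = [j for j, fj in enumerate(f, start=1) if q5 - fj < 1e-4]
print(f"q5 = {q5:.5f}; binding constraints j = {binding}")
```

The tolerance of 1e-4 absorbs the rounding of the published q values; the largest slack function under these assumptions (f_3) lies about 0.0035 below q_5.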

The proposed solution must therefore satisfy the following
conditions:

$$\frac{\partial L}{\partial q_i} = 0.0,\qquad i = 1,\dots,5$$

$$\lambda_j \le 0.0,\qquad j = 4, 6, 7, 8, 14, 15, 16$$

With the substitution of the values

$$q_1 = 0.66163,\qquad q_2 = 0.70054,\qquad q_3 = 0.62489,\qquad q_4 = 0.70054,$$

the five constraints ∂L/∂q_i = 0.0 become a set of five
linear equations in seven unknowns (λ_4, λ_6, λ_7, λ_8,
λ_14, λ_15, λ_16). Any solution to this set of equations
which also satisfies the condition:
$$\lambda_j \le 0.0,\qquad j = 4, 6, 7, 8, 14, 15, 16$$

will satisfy the modified Kuhn-Tucker conditions. Using
linear programming methods, such a set of λ's was found,
thereby verifying that the Kuhn-Tucker conditions are
satisfied at the proposed three-dependent strategy of Table
V. The near-optimal solutions to the four- and
five-dependent strategies (Tables VI and VII) could be
analyzed in a similar manner.
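The five linear equations referred to above follow directly from the definition of L. In this sketch, B = {4, 6, 7, 8, 14, 15, 16} is notation introduced only for illustration to denote the binding set, with λ_j = 0.0 for every j outside B:

```latex
\frac{\partial L}{\partial q_5} = 1 + \sum_{j \in B} \lambda_j = 0.0,
\qquad
\frac{\partial L}{\partial q_i} = -\sum_{j \in B} \lambda_j \,\frac{\partial f_j}{\partial q_i} = 0.0,
\quad i = 1, \dots, 4.
```

Writing μ_j = -λ_j ≥ 0.0, these equations say that the μ_j sum to 1.0 and that the μ-weighted sum of the gradients of the binding f_j with respect to (q_1, ..., q_4) is zero; that is, the zero vector lies in the convex hull of those gradients, which is the linear feasibility question settled by the linear programming step.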